Overview
What is Hadoop?
Hadoop is open-source software from Apache that supports distributed processing and data storage. Hadoop is popular for its scalability, reliability, and ability to run on commodity hardware.
Product Demos
- Installation of Apache Hadoop 2.x or Cloudera CDH5 on Ubuntu | Hadoop Practical Demo
- Big Data Complete Course and Hadoop Demo Step by Step | Big Data Tutorial for Beginners | Scaler
- Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop Tutorial | Simplilearn
Product Details
Hadoop Technical Details
| Attribute | Detail |
| --- | --- |
| Operating Systems | Unspecified |
| Mobile Application | No |
Reviews and Ratings (270)
Community Insights
Business Problems Solved
Hadoop has been widely adopted by organizations for various use cases. One of its key use cases is in storing and analyzing log data, financial data from systems like JD Edwards, and retail catalog and session data for an omnichannel experience. Users have found that Hadoop's distributed processing capabilities allow for efficient and cost-effective storage and analysis of large amounts of data. It has been particularly helpful in reducing storage costs and improving performance when dealing with massive data sets.

Furthermore, Hadoop enables the creation of a consistent data store that can be integrated across platforms, making it easier for different departments within organizations to collect, store, and analyze data. Users have also leveraged Hadoop to gain insights into business data, analyze patterns, and solve big data modeling problems. The user-friendly nature of Hadoop has made it accessible to users who are not necessarily experts in big data technologies. Additionally, Hadoop is utilized for ETL processing, data streaming, transformation, and querying data using Hive. Its ability to serve as a large-volume ETL platform and crunching engine for analytical and statistical models has attracted users who were previously reliant on MySQL data warehouses. They have observed faster query performance with Hadoop compared to traditional solutions.

Another significant use case for Hadoop is secure storage without high costs. Hadoop efficiently stores and processes large amounts of data, addressing the problem of secure storage without breaking the bank. Moreover, Hadoop enables parallel processing on large datasets, making it a popular choice for data storage, backup, and machine learning analytics. Organizations have found that it helps maintain and process huge amounts of data efficiently while providing high availability, scalability, and cost efficiency. Hadoop's versatility extends beyond commercial applications; it is also used in research computing clusters to complete tasks faster using the MapReduce framework. Finally, the Systems and IT department relies on Hadoop to create data pipelines and consult on potential projects involving Hadoop. Overall, the use cases of Hadoop span industries and departments, providing valuable solutions for data collection, storage, and analysis.
Reviews
Hadoop: A Robust Big Data Platform
- Capability to integrate with RStudio; most statistical algorithms can be deployed.
- Handling Big Data issues like storage, information retrieval, data manipulation, etc.
- Repetitive tasks like data wrangling, data processing, and cleaning are more efficient in Hadoop because processing times are faster.
- Hadoop requires a reasonably powerful machine, with at least 8 GB of memory and an i5 processor; sometimes the hardware does become a hindrance.
- If Hadoop could connect to Salesforce, that would be tremendously useful, as most CRM data comes from that channel.
- It would be good to have some geocoding features for spatial data analysis using latitudes and longitudes.
Great enterprise tool for handling large data
- The various modules can be challenging to learn, but at the same time they make Hadoop easy to implement and operate.
- Hadoop includes a well-designed file system, the Hadoop Distributed File System (HDFS), which underpins all of its components and programs.
- Hadoop is also very easy to install, which matters, because a tricky installation process can make users lose interest.
- Customer support is quick.
- As much as I appreciate Hadoop, it has some cons as well. I personally think Hadoop should pay more attention to its interactive querying platforms, which in my opinion are quite slow compared to other players in the market.
- Apart from that, Hadoop has so many modules that it becomes difficult and time-consuming to learn and master all of them.
Good tool for unstructured data
- Apache Hadoop has made managing large amounts of data quite easy.
- The system contains a file system known as HDFS (Hadoop Distributed File System), which underpins its components and programs.
- Its parallel processing capability is also a strong aspect of Apache Hadoop.
- It offers interesting and reliable features and functions.
- Apache Hadoop also stores very large data files across machines with high availability.
- I personally feel that Apache Hadoop is slower than other interactive querying platforms. Queries can sometimes take hours, which can be frustrating and discouraging.
- Also, Apache Hadoop has so many modules that it takes much more time to learn all of them. Other than that, optimization is somewhat of a challenge in Apache Hadoop.
Fault Tolerance and High Availability Made Easy with Hadoop
- Map-reduce
- Parallel processing
- Handles node failures
- HDFS: distributed file system
- More connectors
- Query optimization
- Job scheduling
Hadoop Review
- Hadoop's distributed system is reliable.
- High scalability
- Open source, low cost, large community
- Compatibility with Windows systems
- Security needs more focus
- Hadoop lacks real-time processing
Great Option for Unstructured Data
- Used for Massive data collection, storage, and analytics
- Used for MapReduce processes, Hive tables, Spark job input, and for backing up data
- Storing retail catalog and session data to enable an omnichannel experience for customers and 360-degree customer insight
- Having a consistent data store that can be integrated across other platforms, providing a single source of truth.
- HDFS is reliable and solid, and in my experience with it, there are very few problems using it
- Enterprise support from different vendors makes it easier to 'sell' inside an enterprise
- It provides High Scalability and Redundancy
- Horizontal scaling and distributed architecture
- Less organizational support; bugs need to be fixed, and outside help takes a long time to push updates
- Not for small data sets
- Data security needs to be ramped up
- Failure in NameNode has no replication which takes a lot of time to recover
- Less appropriate for small data sets
- Works well for scenarios with bulk amounts of data; such teams can surely go for the Hadoop file system, especially for offline applications
- It is not an instant-querying system like SQL, so if your application can wait for the data crunching, then use it
- Not for real-time applications
Hadoop is pretty Badass
- It is cost effective.
- It is highly scalable.
- Failure tolerant.
- Hadoop does not fit all needs.
- Converting data into a single format takes time.
- Need to take additional security measures to secure data.
Hadoop should not be used directly for real-time analytics. HDFS should be used to store the data, and Hive can be used to query the files, as in the sketch below.
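To make that pattern concrete, here is a minimal, hypothetical sketch of querying files stored in HDFS through Hive's JDBC interface (HiveServer2). The host name, credentials, and the `web_logs` table are placeholders, and the Hive JDBC driver must be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        // Register the Hive JDBC driver (older driver versions require this explicitly).
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // HiveServer2 host/port, database, and user are placeholders for illustration.
        String url = "jdbc:hive2://hiveserver.example.com:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "hadoop_user", "");
             Statement stmt = conn.createStatement()) {
            // Hive compiles this into batch jobs over files stored in HDFS,
            // so expect latency in seconds to minutes rather than milliseconds.
            ResultSet rs = stmt.executeQuery(
                "SELECT event_date, COUNT(*) AS events "
                + "FROM web_logs GROUP BY event_date");
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}
```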
Hadoop needs to be understood thoroughly even before attempting to use it for data warehousing needs. So you may need to take stock of what Hadoop provides, and read up on its accompanying tools to see what fits your needs.
Hadoop: Highly available, scalable and cost effective for big data storage and processing.
- Scalability is one of the main reasons we decided to use Hadoop. Storage and processing power can be seamlessly increased by simply adding more nodes.
- Replication on Hadoop's distributed file system (HDFS) ensures the robustness of stored data and keeps it highly available (see the sketch after this list).
- Using commodity hardware for the nodes in a Hadoop cluster reduces cost and eliminates dependency on any particular proprietary technology.
- User and access management are still challenging to implement in Hadoop; deploying a Kerberized, secured cluster is quite a challenge in itself.
- Running multiple application versions on a single cluster would be a nice-to-have feature.
- Processing a large number of small files also becomes a problem on a very large cluster with hundreds of nodes.
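As a small illustration of the replication point above, this sketch uses the HDFS Java API to raise the replication factor of a single file. The NameNode address and file path are placeholders; in practice `fs.defaultFS` usually comes from core-site.xml rather than being set in code.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        // Placeholder cluster address; normally loaded from core-site.xml.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/data/events/2016-01-01.log");

        // Each block of this file will be kept on three DataNodes, so the
        // loss of a single node or disk does not make the data unavailable.
        fs.setReplication(file, (short) 3);

        FileStatus status = fs.getFileStatus(file);
        System.out.println(file + " replication factor: " + status.getReplication());
    }
}
```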
Hadoop for Big Data
- Highly Scalable Architecture
- Low cost
- Can be used in a Cloud Environment
- Can be run on commodity Hardware
- Open Source
- It's open source, but companies like Hortonworks and Cloudera provide enterprise support
- Lots of scripting still needed
- Some tools in the Hadoop ecosystem overlap
- Analyzing a huge quantity of data at a low cost; it is definitely the future.
- Machine learning with Spark is also a good use case.
- You can also use AWS EMR with S3 to store a lot of data at low cost.
A newbie's look at Hadoop
We are using Cloudera 5.6 (along with Puppet) to orchestrate the installation and manage the Hadoop cluster.
- The distributed replicated HDFS filesystem allows for fault tolerance and the ability to use low cost JBOD arrays for data storage.
- YARN with MapReduce2 gives us a job slot scheduler to fully utilize available compute resources while providing HA and resource management.
- The Hadoop ecosystem allows many different technologies to share the same compute resources, so your Spark, Samza, Camus, Pig, and Oozie jobs can happily co-exist on the same infrastructure.
- Without Cloudera as a management interface, the Hadoop components are much harder to manage consistently across a cluster.
- Matching hardware resources to job slots and resource-management settings can be quite an exercise in finding the "sweet spot" for your applications; a more transparent way of figuring this out would be welcome.
- A lot of the roles and management pieces are written in Java, which from an administration perspective brings its own issues with garbage collection and memory management.
- HDFS provides a very robust and fast data storage system.
- Hadoop works well with generic "commodity" hardware negating the need for expensive enterprise grade hardware.
- It is mostly unaffected by system and hardware failures of nodes and is self-sustained.
- While its open source nature provides a lot of benefits, there are multiple stability issues that arise due to it.
- Limited support for interactive analytics.
Hadoop - best data optimization for the Enterprise
- Hadoop is a very cost effective storage solution for businesses’ exploding data sets.
- Hadoop can store and distribute very large data sets across hundreds of servers operating in parallel, so it is a highly scalable storage platform.
- Hadoop can process terabytes of data in minutes, faster than other data processors.
- Hadoop File System can store all types of data, structured and unstructured, in nodes across many servers
- For now, Hadoop is doing great and is very productive.
Hadoop - Effective tool for large scale distributed processing.
- Hadoop is an excellent framework for building distributed, fault-tolerant data processing systems that leverage HDFS, which is optimized for high-throughput storage performance.
- Hadoop MapReduce is a powerful programming model that can be used directly from the Java programming language or through data-flow languages like Apache Pig (see the sketch after this list).
- Hadoop has a rich ecosystem of companion tools that enable easy integration for ingesting large amounts of data efficiently from various sources. For example, Apache Flume can act as a data bus that uses HDFS as a sink and integrates effectively with disparate data sources.
- Hadoop can also be leveraged to build complex data processing and machine learning workflows, thanks to Apache Mahout, which uses Hadoop's MapReduce model to run complex algorithms.
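To illustrate the MapReduce programming model mentioned in the list above, here is the classic word-count job written against the `org.apache.hadoop.mapreduce` API. Input and output paths are supplied on the command line, and the output directory must not already exist.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every token in its input split.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums the counts for each word across all mappers.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory (must not exist)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Such a job would typically be packaged into a jar and submitted with something like `hadoop jar wordcount.jar WordCount /input /output`.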
- Hadoop is a batch-oriented processing framework; it lacks real-time or stream processing.
- Hadoop's HDFS file system is not a POSIX compliant file system and does not work well with small files, especially smaller than the default block size.
- Hadoop cannot be used for running interactive jobs or analytics.
2. Do you require real-time analytical processing? If yes, Hadoop's MapReduce may not be a great asset in that scenario.
3. Do you want to process data in a batch fashion and scale to terabyte-sized clusters? Hadoop is definitely a great fit for your use case.
Hadoop the solution to big data problems
- Processing huge data sets.
- Concurrent processing.
- Performance increases with distribution of data across multiple machines.
- Better handling of unstructured data.
- Data nodes and processing nodes
- Make Hadoop lightweight.
- Installation is very difficult. Make it more user friendly.
- Introduce a feature that works with continuous integration.
Fast and Reliable, Use Hadoop!
- Scalability. Hadoop is really useful when you are dealing with a bigger system and you want to make your system scalable.
- Reliable. Very reliable.
- Fast, Fast Fast!!! Hadoop really works very fast, even with bigger datasets.
- Development tools are not that easy to use.
- Learning curve can be reduced. As of now, some skill is a must to use Hadoop.
- Security. In today's world, security is of prime importance. Hadoop could be made more secure to use.
From the experience of a naive developer!
- It was able to map our data with clear distinction based on the key.
- We were able to write simple map reduce code which ran simultaneously on multiple nodes.
- The auto heal system was really helpful in case of multiple failures.
- I think Hadoop should not have a single point of failure in the NameNode.
- It should have good public-facing APIs for easy integration.
- The internals of Hadoop are very abstract.
- Protocol Buffers is a really good concept, but I am not sure whether we have checked other options as well.
1. Decide the number of nodes based on the parallelism we want.
2. The module we want to run should be able to run in parallel on all machines.
Wanna gain insight? Use Hadoop!
- Fast. Before working with Hadoop we had many performance issues and our system was very slow; after adopting Hadoop, performance increased significantly.
- Fault tolerant. The HDFS (Hadoop Distributed File System) is a good platform for working with large data sets and makes the system fault tolerant.
- Scalable. As Hadoop can deal with structured and unstructured data it makes the system scalable.
- Security. As it has to deal with a large data set it can be vulnerable to malicious data.
- Lower performance with smaller data; it doesn't provide effective results if the data is very small.
- Requires a skilled person to handle the system.
Advantage Hadoop
- Processes big volumes of data in a faster manner using parallelism.
- No schema required. Hadoop can process any type of data.
- Hadoop is horizontally scalable.
- Hadoop is free.
- Development tools are not that friendly.
- Hard to find Hadoop resources.
Hadoop - You Can Tame the Elephant
- The built-in data block redundancy helps ensure that the data is safe. Hadoop also distributes the storage, processing, and memory, to work with large amounts of data in a shorter period of time, compared to a typical database system.
- There are numerous ways to get at the data. The basic way is via the Java-based API, by submitting MapReduce jobs in Java (see the sketch after this list). Hive works well for quick queries, using SQL, which are automatically submitted as MapReduce jobs.
- The web-based interface is great for monitoring and administering the cluster, because it can potentially be done from anywhere.
- Impala is a very fast alternative to Hive. Unlike Hive, which submits queries as MapReduce jobs, Impala provides immediate access to the data.
- If you are not familiar with Java and the operating system Hadoop rides on, such as Linux, and have trouble with submitted MapReduce jobs, the error messages can seem cryptic, and it can be challenging to track down the source of the problem.
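As a small, hypothetical example of the Java-based API mentioned in the list above, the sketch below reads a text file directly from HDFS. The path is a placeholder, and the cluster configuration (core-site.xml / hdfs-site.xml) is assumed to be on the classpath.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml if it is on the classpath.
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(
                             fs.open(new Path("/data/reports/summary.txt")),
                             StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```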
The NameNode is a potential single point of failure, and there are two common ways to address it. One way is to have a Secondary NameNode, which periodically creates a copy of the file system image file. The process is called a "checkpoint". In the event of a failure of the Primary NameNode, the Secondary NameNode can be manually configured as the Primary NameNode. The need for manual intervention can cause delays and potentially other problems.
The second method is with a Standby NameNode. In this scenario, the same checkpoints are performed, however, in the event of a Primary NameNode failure, the Standby NameNode will immediately take the place of the Primary, preventing a disruption in service. This method requires additional services to be installed for it to operate.
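For reference, a client can be pointed at such an HA NameNode pair so that failover is transparent to applications. The sketch below sets the relevant properties programmatically purely for illustration; the nameservice ID and host names are made up, and in a real deployment these settings live in hdfs-site.xml.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HaClientConfigExample {
    public static void main(String[] args) throws Exception {
        // Placeholder nameservice and hosts; normally defined in hdfs-site.xml/core-site.xml.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://mycluster");
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "namenode1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "namenode2.example.com:8020");
        // The failover proxy provider lets the client retry against the
        // standby NameNode if the active one goes down.
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                 "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        FileSystem fs = FileSystem.get(conf);
        System.out.println("Connected to " + fs.getUri());
    }
}
```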
Hadoop review
- Streaming data and loading to HDFS
- Load jobs using Oozie and Sqoop for exporting data.
- Analytic queries using MapReduce, Spark and Hive
- Speed is one of the improvements we are looking for. We see Spark as an option and we are excited.
Hadoop >>>> Traditional proprietary Systems
- Cost Effective
- Distributed and Fault Tolerant
- Easily Scalable
- Cluster management and debugging are not very user friendly (there aren't many tools)
- More focus should be given to Hadoop Security
- Single Master Node
- Needs more user adoption (even though it is increasing each day)
User Review of Hadoop
- Gives developers and data analysts flexibility for sourcing, storing and handling large volumes of data.
- Data redundancy and tunable MapReduce parameters to ensure jobs complete in the event of hardware failure (see the sketch after this list).
- Adding capacity is seamless.
- Logs that are easier to read.
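As a rough illustration of the tunable MapReduce parameters mentioned above, the sketch below sets task retry and speculative-execution properties on a job configuration. The specific values are arbitrary and would be tuned per cluster and workload.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class FaultToleranceTuning {

    /** Applies illustrative retry and speculation settings to a configuration. */
    public static void applyTuning(Configuration conf) {
        // Allow each task up to four attempts on different nodes before
        // the whole job is marked as failed.
        conf.setInt("mapreduce.map.maxattempts", 4);
        conf.setInt("mapreduce.reduce.maxattempts", 4);
        // Launch speculative duplicates of slow tasks so a straggling or
        // flaky node does not hold the whole job up.
        conf.setBoolean("mapreduce.map.speculative", true);
        conf.setBoolean("mapreduce.reduce.speculative", true);
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        applyTuning(conf);
        Job job = Job.getInstance(conf, "fault-tolerant job");
        // Mapper, reducer, and input/output paths would be set here as in any other job.
        System.out.println("Max map attempts: "
                + job.getConfiguration().getInt("mapreduce.map.maxattempts", 4));
    }
}
```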